The easiest way to read data to R, is to use a relevant statbank and find the data you need. Do calculations and pivot your data to a presentable table. When ready, save your query and store the query-id somewhere safe. The table can be imported to your r-script with:
sq_data <- read_csv(“https://bank.stat.gl:443/sq/< query-id >.csv”,locale = locale(encoding = “latin1”))
Out-of-the-box Pxweb offers information on selected variables/values in a saved query by adding ‘?select’. But no information on added editing. Also if the metadata in the table, the saved query is based upon, has changed, Pxweb often encounters an error, with no help offered.
Query-id example:
https://bank.stat.gl:443/sq/8fb0941c-3579-4848-a488-6a9afe4266ff
With error:
https://bank.stat.gl:443/sq/8fb0941c-3579-4848-a488-6a9afe42lars
StatGreenland has added a simple ‘sqget’-asp function to bank.stat.gl, that allow one to get information on any existing saved query:
https://bank.stat.gl/sqget.asp?8fb0941c-3579-4848-a488-6a9afe42lars
##########################################
# Full code for sqget.asp, stored in pxweb-root folder
#
##########################################
# <%@ Language=VBScript %>
# ' utf8.asp Written by Hans Kalle for Fisz
# <!--#include file="utf8.asp"-->
# <%
# on error resume next
#
# callstring= split(request.servervariables("QUERY_STRING"),"/")
# teststring= ucase(callstring(UBound(callstring)))
#
# Dim objStream, uttext
# Set objStream = CreateObject("ADODB.Stream")
# objStream.CharSet = "utf-8"
# objStream.Open
# objStream.LoadFromFile(Server.Mappath("/App_Data/Queries/" & testString & ".pxsq"))
# response.write objStream.ReadText()
# objStream.Close
# %>
For more control and deeper integration Pxweb-based statbanks offers a standard api to be consumed by many machine-languages. We now turn to R:
By october 2021 there are two free R resources to read pxweb-based statbanks via api. (‘pxR’ reads local pcaxis-files only. Denmark and Ireland does not use pxweb-out-of-the-box)
https://cran.r-project.org (package repository):
pxweb
PxWebApiData (SSB.no)
csodata (only CSO.ie)
Github:
statgl (stat.gl)
dkstat (only DST.dk)
#====================================================================================
# General purpose & install
#
#====================================================================================
library(tidyverse)
## Install or update packages on cran:
# install.packages("pxweb")
# install.packages("PxWebApiData")
# install.packages("csodata")
## Install or update packages on GitHub:
# if(!require("devtools")) install.packages("devtools")
# library("devtools")
#
# install_github("rOpenGov/dkstat")
# devtools::install_github("StatisticsGreenland/statgl")
General interface to all pxweb based Statbanks.
Last updated 2021-10-09
Highlight: Use pxweb_interactive to find relevant table(s) from one of 28 Statbanks and have ready to run r-script generated
In example 1 the pxweb package is used to get data from bank.stat.gl
#====================================================================================
# Example 1: pxweb (cran)
# Magnusson M, Kainu M, Huovari J, Lahti L (2019). “pxweb: R tools for PX-WEB API.”
#====================================================================================
library(pxweb)
#pxweb_interactive()
data_df_pxweb <- pxweb_get_data(url = "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE80/BEXCALC.PX",
query = list("year of birth" = "*",
gender = c("M", "K"),
"triangles(Lexis)" = "*",
event = "*",
time = "*"))
General interface to all pxweb based Statbanks.
Last updated 2021-10-11
In example 2 the PxWebApiData package is used to get data from statbank.hagstova.fo
#====================================================================================
# Example 2: PxWebApiData (cran)
# Statistics Norway, Øyvind Langsrud <oyl at ssb.no>
#====================================================================================
library(PxWebApiData)
meta <- ApiData("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px",
returnMetaFrames = TRUE)
names(meta)
## [1] "year of birth" "event" "sex" "Triangles(Lexis)"
## [5] "year"
meta[[2]]$values
## [1] "P" "B" "I" "O" "D" "C" "U"
data <- PxWebApiData::ApiData("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px",
"year of birth" = TRUE,
sex = c("M", "F"),
"Triangles(Lexis)" = c("0", "1"),
event = TRUE,
year = TRUE # top3 : 3i instead of TRUE
)
data_df_PxWebApiData <- data[[1]] # Extract the first list element, which contains full variable names.
head(data_df_PxWebApiData,5)
## year of birth event sex Triangles(Lexis) year value
## 1 1885 Population (start of year) Males Upper 1985 1
## 2 1885 Population (start of year) Males Upper 1986 1
## 3 1885 Population (start of year) Males Upper 1987 0
## 4 1885 Population (start of year) Males Upper 1988 0
## 5 1885 Population (start of year) Males Upper 1989 0
General interface to all pxweb based Statbanks.
Last updated 2021-01-04
the statgl-package bundles pxweb-based statbank functionality with presentation features, used by Statistics Greenland.
In example 3 the statgl package is used to get data from statbank.hagstova.fo and also 2 Greenlandic example to show additional features
#====================================================================================
# Example 3: statgl (GitHub)
# Statistics Greenland - https://github.com/StatisticsGreenland/statgl
#====================================================================================
library(statgl)
#statgl_search("Population")
data_df_statgl <- statgl_fetch("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px",
"year of birth" = px_all(),
sex = c("M", "F"),
"Triangles(Lexis)" = c("0", "1"),
event = px_all(),
year = px_all(), # px_top(3)
.val_code=TRUE)
CONST_statbank <- "https://bank.stat.gl/api/v1/en/Greenland"
statgl_url("BEXCALCR", api_url = CONST_statbank) %>%
statgl_fetch(area = c("NUK"),
event = c("P"),
gender = px_all(),
time = px_top(7),
.eliminate_rest = TRUE ,
.col_code = TRUE,
.val_code = FALSE
) %>%
select(-event) %>%
pivot_wider(names_from = time,values_from = value) %>%
statgl_table()
area | gender | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|---|---|---|---|
|
Total | 16.992 | 17.316 | 17.600 | 17.796 | 17.984 | 18.326 | 18.800 |
|
Female | 8.070 | 8.183 | 8.334 | 8.437 | 8.533 | 8.703 | 8.903 |
|
Male | 8.922 | 9.133 | 9.266 | 9.359 | 9.451 | 9.623 | 9.897 |
# https://stat.gl/dialog/mainTheme.asp?lang=en&sc=VM&tname=t1
library(lubridate)
# Import
INXIU101_raw <-
statgl_url("INXIU101", lang = "en") %>%
statgl_fetch(
indicator = 2:4,
time = px_all(),
.col_code = TRUE
) %>%
as_tibble()
# Transform
INXIU101 <-
INXIU101_raw %>%
mutate(
time = time %>% make_date(),
indicator = indicator %>% as.factor() %>% fct_rev()
)
# Plot
INXIU101 %>%
ggplot(aes(
x = time,
y = value,
fill = indicator
)) +
geom_area(position = "identity") +
scale_y_continuous(labels = scales::percent_format(
scale = 1,
accuracy = 1.1,
big.mark = ".",
decimal.mark = ","
)) +
theme_statgl() +
scale_fill_statgl(reverse = TRUE, guide = guide_legend(reverse = TRUE)) +
labs(
title = "At-risk-of-povery rate",
x = " ",
y = " "
)
Statbank Denmark specific
#====================================================================================
# Example 4: dkstat
# https://github.com/rOpenGov/dkstat
#====================================================================================
library(dkstat)
dkstat::dst_search("Population", lang="en") %>% head(5)
## id text unit updated
## 1 FOLK1A Population at the first day of the quarter Number 2021-08-11T08:00:00
## 2 FOLK3 Population 1. January Number 2021-02-11T08:00:00
## 3 BEF5 Population 1. January Number 2021-02-11T08:00:00
## 4 FT Population figures from the censuses Number 2021-02-11T08:00:00
## 5 BY1 Population 1. January Number 2021-04-29T08:00:00
## firstPeriod latestPeriod active
## 1 2008Q1 2021Q3 TRUE
## 2 2008 2021 TRUE
## 3 1990 2021 TRUE
## 4 1769 2021 TRUE
## 5 2010 2021 TRUE
## variables
## 1 region, sex, age, marital status, time
## 2 day of birth, birth month, year of birth, time
## 3 sex, age, country of birth, time
## 4 national part, time
## 5 urban and rural areas, age, sex, time
folk1a_meta <- dst_meta("folk1a", lang = "da")
folk1a_meta[[1]]
## $id
## [1] "FOLK1A"
##
## $text
## [1] "Folketal den 1. i kvartalet"
##
## $description
## [1] "Folketal den 1. i kvartalet efter område, køn, alder, civilstand og tid"
##
## $unit
## [1] "Antal"
##
## $updated
## [1] "2021-08-11T08:00:00"
##
## $footnote
## NULL
folk1a_meta[[2]]
## id text elimination
## 1 OMRÅDE område TRUE
## 2 KØN køn TRUE
## 3 ALDER alder TRUE
## 4 CIVILSTAND civilstand TRUE
## 5 Tid tid FALSE
folk1a_meta[[3]]$KØN
## id text
## 1 TOT I alt
## 2 1 Mænd
## 3 2 Kvinder
data_df_dkstat <- str(dst_get_data(table = "folk1a",
OMRÅDE = c("Hele landet", "København", "Frederiksberg", "Odense"),
CIVILSTAND = "*",
TID = "*",
lang = "da",
meta_data = folk1a_meta))
## 'data.frame': 1100 obs. of 4 variables:
## $ OMRÅDE : chr "Hele landet" "Hele landet" "Hele landet" "Hele landet" ...
## $ CIVILSTAND: chr "I alt" "I alt" "I alt" "I alt" ...
## $ TID : Date, format: "2008-01-01" "2008-04-01" ...
## $ value : int 5475791 5482266 5489022 5505995 5511451 5515287 5519441 5532531 5534738 5540241 ...
Statbank Ireland specific
#====================================================================================
# Example 5: csodata (cran)
# Conor Crowley <conor.crowley at cso.ie>
#====================================================================================
library(csodata)
toc <- cso_get_toc()
head(toc)
## LastModified
## 1 2021-10-20 00:20:00
## 2 2021-10-20 00:20:00
## 3 2021-10-20 00:20:00
## 4 2021-10-20 00:20:00
## 5 2021-10-20 00:20:00
## 6 2021-10-20 00:20:00
## title id
## 1 Roscommon Population by Private Households, Occupied and Vacancy Rate CD168
## 2 Sligo Population by Private Households, Occupied and Vacancy Rate CD169
## 3 Tipperary Population by Private Households, Occupied and Vacancy Rate CD170
## 4 Waterford Population by Private Households, Occupied and Vacancy Rate CD171
## 5 Westmeath Population by Private Households, Occupied and Vacancy Rate CD172
## 6 Wexford Population by Private Households, Occupied and Vacancy Rate CD173
population <- cso_search_toc("Population")
tbl1 <- cso_get_data("PEB07")
meta1 <- cso_get_meta("PEA19") %>% as_tibble()
cso_disp_meta("PEA19")
## [1] "Year" "Sex" "Education Level"
## [4] "Inward or Outward Flow"
## [1] "Estimated Migration Aged 15 Years and Over (Persons in April)"
# data_df_cso <- statgl_fetch(url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21",
# Year = px_all(),
# sex = px_all(),
# Nationality = px_all(),
# .val_code=TRUE)
#
data_df_cso <- pxweb_get_data(url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21",
query = list(Year = "*",
sex = "*",
Nationality = "*"))
## Error:
## This is not a PXWEB API:
## https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21
Read PX-files to R
Last updated 2020-06-07
#====================================================================================
# Example 6: pxR (cran)
# Carlos J. Gil Bellosta <cgb at datanalytics.com>
#====================================================================================
library(pxR)
# Read px-files
# library(pxR)
#
# Reading PC-Axis files into R
# Function read.px reads a PC-Axis file from a given location and returns an object of class px containing all the data and metadata in the original PC-Axis file.
#
# The single most important piece of information within a pxobject is the data matrix, which can be extracted into a R data.frame using function as.data.frame. For instance,
#
# my.px.object <- read.px("/path/to/pc-axis/file")
# my.px.data <- as.data.frame(my.px.object)
# will create the data.frame my.px.data with the data in the corresponding PC-Axis file.